Abstract. We show how to represent an interval of real numbers in an abstract numeration system built on a language that is not necessarily regular. As an application, we consider representations of real numbers using the Dyck language. We also show that our framework can be applied to the rational base numeration systems.


1. Introduction 

In [LR02], P. Lecomte and the third author showed how to represent an interval of real numbers in an abstract numeration system built on a regular language satisfying some suitable conditions. In this paper, we provide a wider framework and we show that their results can be extended to abstract numeration systems built on a language that is not necessarily regular. Our aim is to provide a unified approach for the representation of real numbers in various numeration systems encountered in the literature [AFS08, DT89, LR01, Lot02].

This paper is organized as follows. In the second section, we recall some useful definitions and results from automata theory. In Section 3, we restate the general framework of [LR02]. Then in Section 4, we show that the infinite words obtained as limits of words of a language are exactly the infinite words having all their prefixes in the corresponding prefix closure. In view of this result, we shall consider only abstract numeration systems built on a prefix-closed language to represent the reals. One can notice that usual numeration systems like integer base systems, $\beta$-numeration or substitutive numeration systems are all built on prefix-closed languages [DT89, Lot02]. In Section 5, we show how to represent an interval $[s_0, 1]$ of real numbers in a generalized abstract numeration system built on a language satisfying some general hypotheses. Finally, in Section 6, we give three applications of our methods that were not settled yet by the results of [LR02]. First, we consider a non-regular language $L$ such that its prefix-language $\mathrm{Pref}(L)$ is regular. In a second part, we consider the representation of real numbers in the generalized abstract numeration system built on the language of the prefixes of the Dyck words. In this case, neither the Dyck language $D$ nor its prefix-closure $\mathrm{Pref}(D)$ is recognized by a finite automaton. We compute the complexity functions of this language, i.e., for each word $w$, the function mapping an integer $n$ onto $\mathrm{Card}(w^{-1}D \cap \{a,b\}^n)$, and we show that we can apply our results to the corresponding abstract numeration system. The third application that we consider is the abstract numeration system built on the language of a rational base numeration system recently introduced in [AFS08]. We show that our method leads, up to some scaling factor, to the same representation of the reals as the one given in [AFS08].

2. Preliminaries 

Let us recall some usual definitions. For more details, see for instance [Eil74] or [Sak03]. An alphabet is a non-empty finite set of symbols, called letters. A word over an alphabet $\Sigma$ is a finite or infinite sequence of letters in $\Sigma$. The empty word is denoted by $\varepsilon$. The set of finite (resp. infinite) words over $\Sigma$ is denoted by $\Sigma^*$ (resp. $\Sigma^\omega$). The set $\Sigma^*$ is the free monoid generated by $\Sigma$ with respect to the concatenation product of words and with $\varepsilon$ as neutral element. A language (resp. $\omega$-language) over $\Sigma$ is a subset of $\Sigma^*$ (resp. $\Sigma^\omega$). If $w$ is a finite word over $\Sigma$, the length of $w$, denoted by $|w|$, is the number of its letters and, if $a \in \Sigma$, then $|w|_a$ is the number of occurrences of $a$ in $w$. If $w$ is a finite (resp. infinite) word over $\Sigma$, then for all $i \in [0, |w|-1]$ (resp. $i \in \mathbb{N}$), $w[i]$ denotes its $(i+1)$st letter, for all $0 \le i \le j \le |w|-1$ (resp. $0 \le i \le j$), the factor $w[i,j]$ of $w$ is the word $w[i]\cdots w[j]$, and for all $i \in [0, |w|]$ (resp. $i \in \mathbb{N}$), $w[0,i-1]$ is the prefix of length $i$ of $w$, where we set $w[0,-1] := \varepsilon$. The set of prefixes of a word $w$ (resp. a language $L$) is denoted by $\mathrm{Pref}(w)$ (resp. $\mathrm{Pref}(L)$). Notice that indices are counted from 0.

One can endow $\Sigma^\omega \cup \Sigma^*$ with a metric space structure as follows. If $x$ and $y$ are two distinct infinite words over $\Sigma$, define the distance $d$ over $\Sigma^\omega$ by $d(x,y) := 2^{-\ell}$, where $\ell = \inf\{i \in \mathbb{N} \mid x[i] \ne y[i]\}$ is the length of the maximal common prefix between $x$ and $y$. We set $d(x,x) = 0$ for all $x \in \Sigma^\omega$. This distance can be extended to $\Sigma^\omega \cup \Sigma^*$ by replacing the finite words $z$ by $z\#^\omega$, where $\#$ is a new letter not in $\Sigma$. A sequence $(w^{(n)})_{n \ge 0}$ of words over $\Sigma$ converges to an infinite word $w$ over $\Sigma$ if $d(w^{(n)}, w) \to 0$ as $n \to +\infty$.
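As a small illustration of this metric, the following Python sketch computes $d$ for finite words by padding them with $\#$, as in the definition above. Infinite words can only be handled here through finite prefixes. The function name `distance` and the encoding of words as strings are our own choices and do not come from the text.

```python
from fractions import Fraction

PAD = "#"  # new letter, assumed not to belong to the alphabet, used to pad finite words


def distance(x: str, y: str) -> Fraction:
    """d(x, y) = 2^(-l), where l is the length of the longest common prefix
    of x#^omega and y#^omega, and d(x, x) = 0."""
    if x == y:
        return Fraction(0)
    i = 0
    while True:
        cx = x[i] if i < len(x) else PAD
        cy = y[i] if i < len(y) else PAD
        if cx != cy:
            return Fraction(1, 2**i)
        i += 1


# (ab)^n approaches (ab)^omega, represented here by a long prefix.
target = "ab" * 50
print([distance("ab" * n, target) for n in range(1, 5)])
```

Each additional factor $ab$ divides the distance by 4, which shows that $(ab)^n \to (ab)^\omega$.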

A deterministic (finite or infinite) automaton over an alphabet $\Sigma$ is a directed graph $\mathcal{A} = (Q, q_0, \Sigma, \delta, F)$, where $Q$ is the set of states, $q_0$ is the initial state, $F \subseteq Q$ is the set of final states and $\delta : Q \times \Sigma \to Q$ is the transition function. The transition function can be naturally extended to $Q \times \Sigma^*$ by $\delta(q, \varepsilon) = q$ and $\delta(q, aw) = \delta(\delta(q,a), w)$ for all $q \in Q$, $a \in \Sigma$ and $w \in \Sigma^*$. We often use $q \cdot w$ as shorthand for $\delta(q, w)$. A state $q \in Q$ is accessible (resp. coaccessible) if there exists a word $w \in \Sigma^*$ such that $\delta(q_0, w) = q$ (resp. $\delta(q, w) \in F$), and $\mathcal{A}$ is accessible (resp. coaccessible) if all its states are accessible (resp. coaccessible). A word $w \in \Sigma^*$ is accepted by $\mathcal{A}$ if $\delta(q_0, w) \in F$. The set of accepted words is the language recognized by $\mathcal{A}$. A deterministic automaton is said to be finite (resp. infinite) if its set of states is finite (resp. infinite). A language is regular if it is recognized by some deterministic finite automaton (DFA).
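Here is a minimal Python sketch of these notions for a finite automaton. It assumes that the transition function is stored as a dictionary `delta` mapping `(state, letter)` pairs to states, and a missing key models an undefined transition, as in a trim automaton. The names `run`, `accessible` and `coaccessible` are ours. The example automaton recognizes $a^*b$ and has a sink state 2; it is hypothetical.

```python
from collections import deque


def run(delta, q, w):
    """Extended transition function delta(q, w); None if some transition is undefined."""
    for a in w:
        q = delta.get((q, a))
        if q is None:
            return None
    return q


def accessible(delta, q0):
    """States reachable from q0 (breadth-first search)."""
    seen, queue = {q0}, deque([q0])
    while queue:
        p = queue.popleft()
        for (src, _a), dst in delta.items():
            if src == p and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def coaccessible(delta, final):
    """States from which some final state can be reached (search on reversed edges)."""
    seen, queue = set(final), deque(final)
    while queue:
        p = queue.popleft()
        for (src, _a), dst in delta.items():
            if dst == p and src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


# Hypothetical DFA for a*b: state 0 is initial, 1 is final, 2 is a sink.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
final = {1}
print(run(delta, 0, "aab") in final, run(delta, 0, "aba") in final)
print(accessible(delta, 0), coaccessible(delta, final))
```

Removing the non-coaccessible sink state 2 from this automaton gives its trim version, in which the transitions from state 1 become undefined.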

Among all the deterministic automata recognizing a language, one can distinguish the minimal automaton of this language, which is unique up to isomorphism and is defined as follows. The minimal automaton of a language $L$ over an alphabet $\Sigma$ is the deterministic automaton $\mathcal{A}_L = (Q_L, q_{0,L}, \Sigma, \delta_L, F_L)$, where the states are the sets $w^{-1}L = \{x \in \Sigma^* \mid wx \in L\}$ for any $w \in \Sigma^*$, the initial state is $q_{0,L} = \varepsilon^{-1}L = L$, the final states are the sets $w^{-1}L$ with $w \in L$, and the transition function $\delta_L$ is defined by $\delta_L(w^{-1}L, a) = (wa)^{-1}L$ for all $w \in \Sigma^*$ and all $a \in \Sigma$. By construction, $\mathcal{A}_L$ is accessible and the set of accepted words is exactly $L$. It is well known that $\mathcal{A}_L$ is finite if and only if $L$ is regular. The trim minimal automaton of a language is the minimal automaton of this language from which the only possible sink state has been removed, i.e., we keep only the coaccessible states. In this case, the transition function may be partial.

If $L$ is the language recognized by a deterministic automaton $\mathcal{A} = (Q, q_0, \Sigma, \delta, F)$, then $L_q := \{w \in \Sigma^* \mid \delta(q, w) \in F\}$ is the language of the words accepted from the state $q$ in $\mathcal{A}$, and $u_q(n)$ (resp. $v_q(n)$) is the number of words of length $n$ (resp. less than or equal to $n$) in $L_q$. The maps $u_q : \mathbb{N} \to \mathbb{N}$ are called the complexity functions of $\mathcal{A}$. The language $L$ is polynomial if $u_{q_0}(n)$ is $O(n^k)$ for some non-negative integer $k$, and exponential if $u_{q_0}(n)$ is $\Omega(\theta^n)$ for some $\theta > 1$, i.e., if there exists a constant $c > 0$ such that $u_{q_0}(n) \ge c\,\theta^n$ for infinitely many non-negative integers $n$.
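For a finite automaton, the complexity functions can be computed with the recurrence $u_q(0) = 1$ if $q \in F$ (else $0$) and $u_q(n) = \sum_{a} u_{\delta(q,a)}(n-1)$. This recurrence simply splits the accepted words according to their first letter. The Python sketch below uses our own function names and a dictionary-encoded transition function. It applies the recurrence to two hypothetical automata: one for $\{a,b\}^*$, which is exponential with $u_{q_0}(n) = 2^n$, and one for $a^*b$, which is polynomial with $u_{q_0}(n) = 1$ for $n \ge 1$.

```python
from functools import lru_cache


def complexity_functions(delta, final, alphabet):
    """Return (u, v) with u(q, n) = #{w in L_q : |w| = n} and v(q, n) = #{w in L_q : |w| <= n}."""

    @lru_cache(maxsize=None)
    def u(q, n):
        if n == 0:
            return 1 if q in final else 0
        # split the words of length n in L_q according to their first letter
        return sum(u(delta[(q, a)], n - 1) for a in alphabet if (q, a) in delta)

    def v(q, n):
        return sum(u(q, i) for i in range(n + 1))

    return u, v


# Hypothetical examples: {a,b}* (one state) and a*b (trim automaton, no sink state).
u_full, v_full = complexity_functions({(0, "a"): 0, (0, "b"): 0}, {0}, "ab")
u_ab, v_ab = complexity_functions({(0, "a"): 0, (0, "b"): 1}, {1}, "ab")
print([u_full(0, n) for n in range(6)], v_full(0, 5))
print([u_ab(0, n) for n in range(6)], v_ab(0, 5))
```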



REPRESENTING REAL NUMBERS IN A GENERALIZED NUMERATION SYSTEM 3 



3. Generalized Abstract Numeration Systems 

If $L$ is a language over a totally ordered alphabet $(\Sigma, <)$, the genealogical (or radix) ordering $<_{\mathrm{gen}}$ over $L$ induced by $<$ is defined as follows. The words of the language are ordered by increasing length and, for words of the same length, one uses the lexicographical ordering induced by $<$. Recall that for two words $x, y \in \Sigma^*$ of the same length, $x$ is lexicographically less than $y$ if there exist $w, x', y' \in \Sigma^*$ and $a, b \in \Sigma$ such that $x = wax'$, $y = wby'$ and $a < b$. The lexicographical ordering is naturally extended to infinite words.
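In code, the genealogical ordering is conveniently expressed as a sort key: first the length, then the sequence of letter ranks, which Python compares lexicographically. The sketch below is ours. The argument `order` lists the alphabet from the smallest to the largest letter.

```python
def genealogical_key(word, order):
    """Sort key for the genealogical (radix) ordering induced by the letter order `order`."""
    rank = {letter: i for i, letter in enumerate(order)}
    return (len(word), [rank[c] for c in word])


def lex_less(x, y, order):
    """For words of the same length: x = w a x', y = w b y' with a < b."""
    rank = {letter: i for i, letter in enumerate(order)}
    for cx, cy in zip(x, y):
        if cx != cy:
            return rank[cx] < rank[cy]
    return False  # x == y


words = ["ba", "b", "ab", "", "a", "aab"]
print(sorted(words, key=lambda w: genealogical_key(w, "ab")))
print(sorted(words, key=lambda w: genealogical_key(w, "ba")))
```

Changing the order on the alphabet changes the order of the words within each length, but not the order between lengths.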

Definition 1. A (generalized) abstract numeration system is a triple $S = (L, \Sigma, <)$, where $L$ is an infinite language over a totally ordered alphabet $(\Sigma, <)$. Enumerating the words of $L$ using the genealogical ordering $<_{\mathrm{gen}}$ induced by the ordering $<$ on $\Sigma$ gives a one-to-one correspondence $\mathrm{rep}_S : \mathbb{N} \to L$ mapping the non-negative integer $n$ onto the $(n+1)$st word in $L$. In particular, $0$ is sent onto the first word in the genealogically ordered language $L$. The reciprocal map is denoted by $\mathrm{val}_S : L \to \mathbb{N}$ and, for all $w \in L$, $\mathrm{val}_S(w)$ is called the $S$-numerical value of $w$.

Compared with [LR01], we do not require the language of the numeration system to be regular. This is the reason for the terminology "generalized".

Example 2. Let $\Sigma = \{a, b\}$, $L = \{w \in \Sigma^* : \big||w|_a - |w|_b\big| \le 1\}$, and $S = (L, \Sigma, a < b)$. The minimal automaton of $L$ is given in Figure 1. The first words of $L$ are

$\varepsilon, a, b, ab, ba, aab, aba, abb, baa, bab, bba, aabb, abab, abba, baab, \ldots$
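The list above can be reproduced by brute force. The sketch below (our own code) runs through $\{a,b\}^n$ for increasing $n$ and keeps the words of $L$. Since `itertools.product` yields tuples in lexicographic order with respect to the order of its input, the input `"ab"` produces the words directly in the genealogical order induced by $a < b$.

```python
from itertools import product


def in_L(w):
    """Membership in L = {w : ||w|_a - |w|_b| <= 1} (Example 2)."""
    return abs(w.count("a") - w.count("b")) <= 1


def first_words(count):
    """First `count` words of L in the genealogical order induced by a < b."""
    words, n = [], 0
    while len(words) < count:
        # product("ab", repeat=n) lists {a,b}^n in lexicographic order for a < b
        words.extend(w for w in map("".join, product("ab", repeat=n)) if in_L(w))
        n += 1
    return words[:count]


print(first_words(15))
```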




Figure 1. The minimal automaton of L. 



The following proposition is a result from [LR02] extended to any language. This 
shows how to compute the numerical value of a word in the numeration language. 

Proposition 3. Let $S = (L, \Sigma, <)$ be a (generalized) abstract numeration system and let $\mathcal{A} = (Q, q_0, \Sigma, \delta, F)$ be a deterministic automaton recognizing $L$. If $w \in L$, then we have
$$\mathrm{val}_S(w) = v_{q_0}(|w|-1) + \sum_{i=0}^{|w|-1} \sum_{a < w[i]} u_{q_0 \cdot w[0,i-1]a}(|w|-i-1).$$
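For the language $L$ of Example 2, the state reached after reading a word is determined by the difference $d = |w|_a - |w|_b$, and $u$ can be written with binomial coefficients. A suffix of length $m$ containing $k$ letters $a$ is accepted from $d$ if and only if $|d + 2k - m| \le 1$. The following sketch evaluates the formula of Proposition 3 and checks it against the genealogical enumeration. It encodes states by $d$ instead of using the automaton of Figure 1 (our choice). Here $v(-1) = 0$, so that $\mathrm{val}_S(\varepsilon) = 0$.

```python
from itertools import product
from math import comb


def u(d, m):
    """Number of words x of length m accepted from the state d, i.e. with |d + |x|_a - |x|_b| <= 1."""
    total = 0
    for e in (-1, 0, 1):  # admissible final differences
        if (m + e - d) % 2 == 0:
            k = (m + e - d) // 2  # number of a's in x
            if 0 <= k <= m:
                total += comb(m, k)
    return total


def v(n):
    """Number of words of L of length at most n (0 when n = -1)."""
    return sum(u(0, i) for i in range(n + 1))


def val_S(w):
    """Proposition 3 for Example 2; the only letter smaller than b is a."""
    n, total, d = len(w), v(len(w) - 1), 0
    for i, c in enumerate(w):
        if c == "b":
            total += u(d + 1, n - i - 1)  # words with prefix w[0, i-1] a
        d += 1 if c == "a" else -1
    return total


# Compare with the rank in the genealogical enumeration of L.
words = [w for n in range(9) for w in map("".join, product("ab", repeat=n))
         if abs(w.count("a") - w.count("b")) <= 1]
print(all(val_S(w) == i for i, w in enumerate(words)))
```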

4. Languages L with Uncountable Adh(L) 

The notion of adherence has been introduced in [Niv78] and has been extensively studied in [BN80].

Definition 4. Let $L$ be a language over an alphabet $\Sigma$. The adherence of $L$, denoted by $\mathrm{Adh}(L)$, is the set of infinite words over $\Sigma$ whose prefixes are prefixes of words in $L$:
$$\mathrm{Adh}(L) = \{w \in \Sigma^\omega \mid \mathrm{Pref}(w) \subseteq \mathrm{Pref}(L)\}.$$
Notice that $\mathrm{Adh}(L)$ is empty if and only if $L$ is finite.

For the usual topology on $\Sigma^* \cup \Sigma^\omega$, the closure $\overline{L}$ of a language $L$ over $\Sigma$ satisfies the equality $\overline{L} = L \cup \mathrm{Adh}(L)$.

The following lemma gives a characterization of the adherence of a language [BN80]. We give a proof for the sake of completeness.
Lemma 5. Let $L$ be a language over an alphabet $\Sigma$. The adherence of $L$ is the set of infinite words over $\Sigma$ that are limits of words in $L$:
$$\mathrm{Adh}(L) = \{w \in \Sigma^\omega \mid \exists (w^{(n)})_{n \ge 0} \in L^{\mathbb{N}},\ w^{(n)} \to w\}.$$

Proof. Take an infinite word $w$ in $\mathrm{Adh}(L)$. Then for all $n \ge 0$, we have $w[0,n-1] \in \mathrm{Pref}(L)$. Thus for all $n \ge 0$, there exists a finite word $z^{(n)} \in \Sigma^*$ such that $w^{(n)} := w[0,n-1]z^{(n)}$ belongs to $L$. Obviously $w^{(n)} \to w$, and $w$ belongs to the right-hand side set in the statement. Conversely, take an infinite word $w$ which is the limit of a sequence $(w^{(n)})_{n \ge 0}$ of words in $L$. Then for all $\ell \ge 0$, there exists $n \ge 0$ such that $w[0,\ell-1] \in \mathrm{Pref}(w^{(n)}) \subseteq \mathrm{Pref}(L)$. This shows that $w$ belongs to $\mathrm{Adh}(L)$. □

The notion of center of a language can be found in [BN80].

Definition 6. Let $L$ be a language over an alphabet $\Sigma$. The center of $L$, denoted by $\mathrm{Center}(L)$, is the prefix-closure of the adherence of $L$:
$$\mathrm{Center}(L) = \mathrm{Pref}(\mathrm{Adh}(L)).$$

The next lemma gives a characterization of the center of a language [BN80]. Again, we give a proof for the sake of completeness.

Lemma 7. Let $L$ be a language over an alphabet $\Sigma$. The center of $L$ is the set of words which are prefixes of an infinite number of words in $L$:
$$\mathrm{Center}(L) = \{w \in \mathrm{Pref}(L) \mid w^{-1}L \text{ is infinite}\}.$$

Proof. Take a word $w$ in $\mathrm{Center}(L)$. By definition, there exists an infinite word $z$ over $\Sigma$ such that $wz$ belongs to $\mathrm{Adh}(L)$. Then for all $n \ge 0$, $wz[0,n-1]$ belongs to $\mathrm{Pref}(L)$. Thus for all $n \ge 0$, there exists a finite word $y^{(n)} \in \Sigma^*$ such that $w^{(n)} := wz[0,n-1]y^{(n)}$ belongs to $L$, and there are infinitely many such words $w^{(n)}$. Conversely, let $w$ be a prefix of infinitely many words in $L$. Since $\Sigma$ is finite, there exists a letter $a_0 \in \Sigma$ such that $wa_0$ is a prefix of infinitely many words in $L$. Iterating this argument, there exists a sequence $(a_n)_{n \ge 0}$ of letters in $\Sigma$ such that $wa_0 \cdots a_n$ belongs to $\mathrm{Pref}(L)$ for all $n \ge 0$. This implies that $wa_0a_1\cdots$ belongs to $\mathrm{Adh}(L)$. Hence $w$ belongs to $\mathrm{Center}(L)$. □
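For a language recognized by a finite automaton, Lemma 7 turns membership in $\mathrm{Center}(L)$ into a graph property. The residual $w^{-1}L = L_{q_0 \cdot w}$ is infinite if and only if a cycle of coaccessible states can be reached from $q_0 \cdot w$. The sketch below implements this test; it is our own code, for finite automata given by a partial transition dictionary. The automaton for $a^*b(\varepsilon + a)$ is a hypothetical example whose center is $a^*$.

```python
def coaccessible(delta, final):
    """States from which a final state can be reached."""
    preds = {}
    for (p, _a), r in delta.items():
        preds.setdefault(r, set()).add(p)
    seen, stack = set(final), list(final)
    while stack:
        r = stack.pop()
        for p in preds.get(r, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def in_center(delta, final, q0, w):
    """Lemma 7 for a finite automaton: w in Center(L) iff w in Pref(L) and w^{-1}L is infinite."""
    good = coaccessible(delta, final)
    q = q0
    for c in w:
        q = delta.get((q, c))
        if q is None:
            return False
    if q not in good:  # w is not a prefix of a word of L
        return False
    succ = {p: [r for (s, _a), r in delta.items() if s == p and r in good] for p in good}
    state = {}  # 1: on the current DFS path, 2: fully explored

    def reaches_cycle(p):
        state[p] = 1
        for r in succ[p]:
            if state.get(r) == 1 or (r not in state and reaches_cycle(r)):
                return True
        state[p] = 2
        return False

    return reaches_cycle(q)


# Hypothetical DFA for L = a*b(eps + a): 0 --a--> 0, 0 --b--> 1, 1 --a--> 2; final states {1, 2}.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2}
final = {1, 2}
print([w for w in ("", "a", "aaa", "b", "ab", "ba") if in_center(delta, final, 0, w)])
```

Here $b^{-1}L = \{\varepsilon, a\}$ is finite, so $b \notin \mathrm{Center}(L)$ even though $b \in \mathrm{Pref}(L)$.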

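Definition 8. If $L$ is a language over an alphabet $\Sigma$,
$$L_\infty = \{w \in \Sigma^\omega \mid \exists^\infty n \in \mathbb{N},\ w[0,n-1] \in L\}$$
denotes the set of infinite words over $\Sigma$ having infinitely many prefixes in $L$.

Observe that $L_\infty$ is empty if $L$ is finite. Contrary to the case of $\mathrm{Adh}(L)$, the converse does not hold in general: for instance, $L = \{a^n b \mid n \ge 0\}$ is infinite, whereas $L_\infty = \emptyset$.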
The following lemma is obvious. 

Lemma 9. For any language $L$, we have $L_\infty \subseteq \mathrm{Adh}(L)$. Moreover, if $L$ is a prefix-closed language, then $L_\infty = \mathrm{Adh}(L)$.

Let us recall two results from [LR02].

Proposition 10. Let $L$ be a regular language. The set $\mathrm{Adh}(L)$ is uncountably infinite if and only if, in any deterministic finite automaton accepting $L$, there exist at least two distinct cycles $(p_1, \ldots, p_r, p_1)$ and $(q_1, \ldots, q_s, q_1)$, where $r, s \ge 2$, starting from the same accessible and coaccessible state $p_1 = q_1$.

Proposition 11. Let $L$ be a regular language. The set $L_\infty$ is uncountably infinite if and only if, in any deterministic finite automaton accepting $L$, there exist at least two distinct cycles $(p_1, \ldots, p_r, p_1)$ and $(q_1, \ldots, q_s, q_1)$, where $r, s \ge 2$, starting from the same accessible state $p_1 = q_1$ and such that each of them contains at least one final state.
It is well known [SYZS92] that the set of regular languages splits into two parts: the set of exponential languages and the set of polynomial languages. The polynomial regular languages over an alphabet $\Sigma$ are exactly the finite unions of languages of the form
$$x_1 y_1^* x_2 y_2^* \cdots x_k y_k^* x_{k+1}, \tag{1}$$
where $k \ge 0$ and the $x_i$'s and the $y_i$'s are finite words over $\Sigma$. Consequently, in view of Proposition 10, the following result is obvious.

Corollary 12. If $L$ is a regular language, then the following assertions are equivalent:

• $\mathrm{Adh}(L)$ is an uncountable set;

• $L$ is exponential;

• $\mathrm{Pref}(L)$ is exponential.

If the considered language is not regular, then only the sufficient conditions of Proposition 10 and Proposition 11 hold true. They can be restated as follows.

Proposition 13. If, in any deterministic automaton accepting a language $L$, there exist at least two distinct cycles $(p_1, \ldots, p_r, p_1)$ and $(q_1, \ldots, q_s, q_1)$, where $r, s \ge 2$, starting from the same accessible and coaccessible state $p_1 = q_1$, then the set $\mathrm{Adh}(L)$ is uncountably infinite and $L$ is exponential.

Proposition 14. If, in any deterministic automaton accepting a language $L$, there exist at least two distinct cycles $(p_1, \ldots, p_r, p_1)$ and $(q_1, \ldots, q_s, q_1)$, where $r, s \ge 2$, starting from the same accessible state $p_1 = q_1$ and such that each of them contains at least one final state, then the set $L_\infty$ is uncountably infinite and $L$ is exponential.

There exist non-regular exponential languages with an uncountable associated set $L_\infty$, and thus also with an uncountable set $\mathrm{Adh}(L)$, that are recognized by a deterministic automaton without distinct cycles satisfying the condition of Proposition 13. For instance, see Example 43 of Section 6 about a rational base number system. Notice that the corresponding trim minimal automaton, depicted in a figure further on, has an infinite number of final states. Note that, by considering automata having a finite set of final states, we get back the necessary condition of Proposition 11.

Proposition 15. Let $L$ be a language recognized by a deterministic automaton $\mathcal{A}$ having a finite set of final states. The set $L_\infty$ is uncountably infinite if and only if there exist in $\mathcal{A}$ at least two distinct cycles $(p_1, \ldots, p_r, p_1)$ and $(q_1, \ldots, q_s, q_1)$, where $r, s \ge 2$, starting from the same accessible state $p_1 = q_1$ and such that each of them contains at least one final state.

Proof. In view of Proposition 14, we only have to show that the condition is necessary. Since there is only a finite number of final states, if $w \in L_\infty$, then there exist a final state $f$ and infinitely many $n$ such that $q_0 \cdot w[0,n-1] = f$. If $\mathcal{A}$ does not contain such distinct cycles, then this implies that any word in $L_\infty$ is of the form $xy^\omega$, where $x, y$ are finite words. Since there are only countably many such words, we would get that $L_\infty$ is a countable set. The conclusion follows. □

Corollary 16. Let $L$ be a language recognized by a deterministic automaton $\mathcal{A}$ having a finite set of final states. If $L_\infty$ is an uncountable set, then $L$ is exponential.

Remark 17. Any deterministic automaton recognizing a non-regular prefix-closed language has an infinite number of final states. Indeed, in such an automaton, all coaccessible states are final.

There exist exponential (and prefix-closed) languages L with a countable, and 
even finite, set Adh(L). We give an example of such a language. 
Example 18. Let $L = \{w \in \{a,b\}^* \mid \exists u \in \{a,b\}^* : w = a^{|u|}u\}$. We have
$$u_{q_0}(n) = \begin{cases} 2^{n/2} & \text{if } n \equiv 0 \bmod 2,\\ 0 & \text{if } n \equiv 1 \bmod 2, \end{cases}$$
and $\mathrm{Adh}(L) = L_\infty = \{a^\omega\}$. The minimal automaton of $L$ is depicted in Figure 2.



Figure 2. The minimal automaton of L.



5. Representation of Real Numbers 



In the framework of [LR02], a real number is represented in an abstract numeration system built on a regular language $L$ as the limit of a sequence of words of $L$. Observe that in this context, thanks to Lemma 5, the set of possible representations of the considered reals is $\mathrm{Adh}(L)$. Therefore, one could consider abstract numeration systems built on the prefix-language instead of the one built on the language itself, see Remark 20 and Remark 21. This point of view is relevant if we compare it with the framework of the classical integer base $b \ge 2$ numeration systems. Indeed, in these systems, the numeration language is
$$\mathcal{L}_b := \{\varepsilon\} \cup \{1, 2, \ldots, b-1\}\{0, 1, \ldots, b-1\}^*,$$
which is of course a prefix-closed language. Notice that this is also the case for non-standard numeration systems like $\beta$-numeration systems and substitutive numeration systems. Adopting this new framework, we consider only abstract numeration systems built on prefix-closed languages. Therefore, to represent real numbers, we no longer distinguish abstract numeration systems built on two distinct languages $L$ and $M$ such that $\mathrm{Pref}(L) = \mathrm{Pref}(M)$.

Let $S = (L, \Sigma, <)$ be a generalized abstract numeration system built on a prefix-closed language $L$. Let $\mathcal{A} = (Q, q_0, \Sigma, \delta, F)$ be an accessible deterministic automaton recognizing $L$. We make the following assumptions:



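Hypotheses.

(H1) The set $\mathrm{Adh}(L)$ is uncountable;
(H2) $\forall w \in \Sigma^*,\ \exists r_w \ge 0 : \displaystyle\lim_{n \to +\infty} \frac{u_{q_0 \cdot w}(n-|w|)}{v_{q_0}(n)} = r_w$;
(H3) $\forall w \in \mathrm{Adh}(L),\ \displaystyle\lim_{\ell \to +\infty} r_{w[0,\ell-1]} = 0$.

Observe that for all $w \notin \mathrm{Center}(L)$, we have $r_w = 0$. Recall that, since $L$ is a prefix-closed language, we have $\mathrm{Adh}(L) = L_\infty$, see Lemma 9.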



Notation. We set $r_0 := r_\varepsilon$ and
$$s_0 := 1 - r_0 = \lim_{n \to +\infty} \frac{v_{q_0}(n-1)}{v_{q_0}(n)}.$$



Remark 19. In [LR02], the authors consider regular languages $L$ with uncountably infinite $\mathrm{Adh}(L)$ such that, for each state $q$ of a DFA recognizing $L$, either $L_q$ is finite or $u_q(n) \sim P_q(n)\,\theta_q^n$, where $P_q \in \mathbb{R}[X]$ and $\theta_q \ge 1$. One can notice that such languages satisfy the hypotheses (H1), (H2) and (H3) above. Indeed, for all states $q$ and all $t \ge 0$, it can be shown that
$$\lim_{n \to +\infty} \frac{u_q(n-t)}{u_{q_0}(n)} = \frac{a_q}{\theta_{q_0}^{\,t}},$$
where $\theta_{q_0} > 1$ and $a_q := \lim_{n \to +\infty} \frac{u_q(n)}{u_{q_0}(n)}$. Since $Q$ is finite, this is sufficient to verify our assumptions. Notice also that for the integer base $b$ numeration system, the three hypotheses are trivially satisfied.

We shall represent real numbers by infinite words $w$ of $\mathrm{Adh}(L)$ by considering the corresponding limit
$$\lim_{n \to +\infty} \frac{\mathrm{val}_S(w[0,n-1])}{v_{q_0}(n)}. \tag{2}$$
Our aim is to show that for all $w \in \mathrm{Adh}(L)$, the limit (2) exists, see Proposition 26.

Remark 20. If the considered abstract numeration system is built on a language that is not prefix-closed, we cannot guarantee that the limit (2) exists. Consider for instance the abstract numeration system built on the language $L$ of Example 2, which is not prefix-closed. The sequences $((ab)^n)_{n \ge 0}$ and $((ab)^n a)_{n \ge 0}$ of words in $L$ converge to the same infinite word $(ab)^\omega$, but the corresponding numerical sequences do not converge to the same real number. More precisely, using the notation of Example 2, we have
$$\lim_{n \to +\infty} \frac{\mathrm{val}_S((ab)^n)}{v_0(2n)} = \frac{3}{4} \quad\text{and}\quad \lim_{n \to +\infty} \frac{\mathrm{val}_S((ab)^n a)}{v_0(2n+1)} = \frac{3}{5}, \tag{3}$$
so that the limit
$$\lim_{n \to +\infty} \frac{\mathrm{val}_S((ab)^\omega[0,n-1])}{v_0(n)}$$
does not exist. This essentially comes from the staircase behaviour of $(u_0(n))_{n \ge 0}$.
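We have that for all $n \ge 0$,
$$u_0(n) = \begin{cases} \dbinom{n}{n/2} & \text{if } n \equiv 0 \bmod 2,\\[1ex] 2\dbinom{n}{(n-1)/2} & \text{if } n \equiv 1 \bmod 2. \end{cases}$$
This implies in particular that $\lim_{n \to +\infty} \frac{v_0(n-1)}{v_0(n)}$ does not exist. Indeed, using Stirling's formula and [Bou07, Ch. V.4, Prop. 2], we have
$$v_0(2n) \sim \frac{8}{3\sqrt{\pi}}\, n^{-\frac12} 4^n \quad\text{and}\quad v_0(2n-1) \sim \frac{5}{3\sqrt{\pi}}\, n^{-\frac12} 4^n \qquad (n \to +\infty). \tag{4}$$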

Hence,
$$\lim_{n \to +\infty} \frac{v_0(2n-1)}{v_0(2n)} = \frac{5}{8} \quad\text{and}\quad \lim_{n \to +\infty} \frac{v_0(2n)}{v_0(2n+1)} = \frac{2}{5}.$$

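By Proposition 3, we obtain that for all $n \ge 1$,
$$\frac{\mathrm{val}_S((ab)^n)}{v_0(2n)} = \frac{v_0(2n-1)}{v_0(2n)} + \sum_{i=0}^{n-1} \frac{u_2(2i)}{v_0(2n)},$$
$$\frac{\mathrm{val}_S((ab)^n a)}{v_0(2n+1)} = \frac{v_0(2n)}{v_0(2n+1)} + \sum_{i=0}^{n-1} \frac{u_2(2i+1)}{v_0(2n+1)},$$
where $u_2$ is the complexity function of the state $q_0 \cdot aa$.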
Using Stirling's formula again, we get
$$u_2(2i) = \binom{2i}{i-1} \sim \frac{4^i}{\sqrt{\pi i}} \qquad (i \to +\infty),$$
$$u_2(2i+1) = \binom{2i+2}{i} \sim \frac{4^{i+1}}{\sqrt{\pi i}} \qquad (i \to +\infty).$$
Therefore, by [Bou07, Ch. V.4, Prop. 2] and in view of (4), it follows that
$$\lim_{n \to +\infty} \sum_{i=0}^{n-1} \frac{u_2(2i)}{v_0(2n)} = \frac{1}{8} \quad\text{and}\quad \lim_{n \to +\infty} \sum_{i=0}^{n-1} \frac{u_2(2i+1)}{v_0(2n+1)} = \frac{1}{5},$$
and we obtain the limits of (3).
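The two different limits in (3) are easy to observe numerically. The following sketch (our own code, using exact integer arithmetic) evaluates Proposition 3 for Example 2 and encodes the state $q_0 \cdot w$ by the difference $|w|_a - |w|_b$. It prints the ratios for $(ab)^n$ and $(ab)^n a$ with $n = 1000$, which by (3) should be close to $3/4$ and $3/5$ respectively.

```python
from math import comb


def u(d, m):
    """Words x of length m accepted from the state with difference d: |d + |x|_a - |x|_b| <= 1."""
    total = 0
    for e in (-1, 0, 1):
        if (m + e - d) % 2 == 0 and 0 <= (m + e - d) // 2 <= m:
            total += comb(m, (m + e - d) // 2)
    return total


def v(n):
    return sum(u(0, i) for i in range(n + 1))


def val_S(w):
    """Proposition 3 for the language of Example 2 (alphabet a < b)."""
    n, total, d = len(w), v(len(w) - 1), 0
    for i, c in enumerate(w):
        if c == "b":
            total += u(d + 1, n - i - 1)
        d += 1 if c == "a" else -1
    return total


n = 1000
even = val_S("ab" * n) / v(2 * n)  # tends to 3/4
odd = val_S("ab" * n + "a") / v(2 * n + 1)  # tends to 3/5
print(round(even, 4), round(odd, 4))
```

Python's integer true division returns a correctly rounded float even for these very large operands, so no overflow occurs.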

Remark 21. Considering prefix-closed languages not only avoids numerical convergence problems as in Remark 20, but also allows us to get rid of problems arising from languages $L$ for which there are infinitely many $n$ such that $L \cap \Sigma^n = \emptyset$, as discussed in [LR02, Remark 4].

Definition 22. If $w \in \mathrm{Adh}(L)$ is such that $\lim_{n \to +\infty} \frac{\mathrm{val}_S(w[0,n-1])}{v_{q_0}(n)} = x$, we say that $w$ is an $S$-representation of $x$.

Example 23. Consider the abstract numeration system built on the Dyck language that will be described in Example 42. Table 1 gives some numerical approximations. We will see further that
$$\lim_{n \to +\infty} \frac{\mathrm{val}_S((aab)^\omega[0,n-1])}{v_{q_0}(n)} = \frac{39}{49} = 0.79592\ldots$$



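| $w$ | $\mathrm{val}_S(w)$ | $v_{q_0}(\lvert w\rvert)$ | $\mathrm{val}_S(w)/v_{q_0}(\lvert w\rvert)$ |
|---|---|---|---|
| a | 1 | 2 | 0.50000 |
| aa | 2 | 4 | 0.50000 |
| aab | 5 | 7 | 0.71429 |
| aaba | 9 | 13 | 0.69231 |
| aabaa | 17 | 23 | 0.73913 |
| aabaab | 32 | 43 | 0.74419 |
| aabaaba | 60 | 78 | 0.76923 |
| aabaabaa | 112 | 148 | 0.75676 |
| aabaabaab | 213 | 274 | 0.77737 |
| aabaabaaba | 404 | 526 | 0.76806 |
| aabaabaabaa | 771 | 988 | 0.78036 |
| aabaabaabaab | 1479 | 1912 | 0.77354 |
| aabaabaabaaba | 2841 | 3628 | 0.78308 |
| aabaabaabaabaa | 5486 | 7060 | 0.77705 |
| aabaabaabaabaab | 10591 | 13495 | 0.78481 |

Table 1. Some numerical approximations.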
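Table 1 can be recomputed exactly. In the sketch below (our own code), the states of the automaton of $\mathrm{Pref}(D)$ are encoded by the height $h = |w|_a - |w|_b$, which matches the states $p_m$ of Example 42. The array `cnt[m][h]` counts the words of length $m$ that can be read from height $h$ without going below $0$, and $\mathrm{val}_S$ is evaluated with Proposition 3. The sketch reproduces the rows of Table 1 and, for a longer prefix of $(aab)^\omega$, gives a ratio close to $39/49$.

```python
def height_counts(max_len):
    """cnt[m][h] = number of words x of length m such that h + |u|_a - |u|_b >= 0
    for every prefix u of x, i.e. u_{p_h}(m); row m stores the heights 0, ..., max_len + 1 - m."""
    cnt = [[1] * (max_len + 2)]  # m = 0: only the empty word
    for _ in range(max_len):
        prev = cnt[-1]
        # first letter a goes to height h + 1; first letter b to h - 1, allowed only if h >= 1
        cnt.append([prev[h + 1] + (prev[h - 1] if h >= 1 else 0) for h in range(len(prev) - 1)])
    return cnt


def table_row(w, cnt):
    """Return (val_S(w), v_{q0}(|w|)) for a prefix w of a Dyck word (a = opening letter)."""

    def v(k):  # number of words of Pref(D) of length at most k
        return sum(cnt[i][0] for i in range(k + 1))

    n = len(w)
    val, h = v(n - 1), 0
    for i, c in enumerate(w):
        if c == "b":  # the smaller letter a leads to height h + 1
            val += cnt[n - i - 1][h + 1]
        h += 1 if c == "a" else -1
    return val, v(n)


def aab_prefix(n):
    """Prefix of length n of (aab)^omega."""
    return ("aab" * (n // 3 + 1))[:n]


cnt = height_counts(400)
for n in range(1, 16):
    val, total = table_row(aab_prefix(n), cnt)
    print(aab_prefix(n), val, total, f"{val / total:.5f}")
val, total = table_row(aab_prefix(400), cnt)
print(val / total, 39 / 49)
```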



Notice that for all $w \in \mathrm{Adh}(L)$, we have $\mathrm{val}_S(w[0,n-1]) \in [v_{q_0}(n-1), v_{q_0}(n)-1]$ for all $n \ge 1$. Therefore, the represented real numbers $x$ must belong to the interval $[s_0, 1]$.

As in [LR02], we divide $[s_0, 1]$ into subintervals $I_y$, for all prefixes $y$ of infinitely many words in $L$. For each $\ell \ge 0$, $\mathrm{Center}(L) \cap \Sigma^\ell$ is the set of words of length $\ell$ which are prefixes of infinitely many words of $L$. For each $y \in \mathrm{Center}(L) \cap \Sigma^\ell$ and







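$n \ge \ell \ge 0$, define
$$\alpha_{y,n} := \frac{v_{q_0}(n-1)}{v_{q_0}(n)} + \sum_{\substack{x < y\\ x \in \mathrm{Center}(L) \cap \Sigma^\ell}} \frac{u_{q_0 \cdot x}(n-\ell)}{v_{q_0}(n)}$$
and
$$I_{y,n} := \left[\alpha_{y,n},\ \alpha_{y,n} + \frac{u_{q_0 \cdot y}(n-\ell)}{v_{q_0}(n)}\right].$$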



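Then, in view of Hypothesis (H2), for all $y \in \mathrm{Center}(L) \cap \Sigma^\ell$, we can define the limit interval
$$I_y := \lim_{n \to +\infty} I_{y,n} = [\alpha_y, \alpha_y + r_y],$$
where
$$\alpha_y := \lim_{n \to +\infty} \alpha_{y,n} = s_0 + \sum_{\substack{x < y\\ x \in \mathrm{Center}(L) \cap \Sigma^\ell}} r_x.$$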

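Moreover, we set $I_y := \emptyset$ for all $y \in L \setminus \mathrm{Center}(L)$. From [LR02], we know that for all $\ell \ge 0$, we have
$$[s_0, 1] = \bigcup_{y \in \mathrm{Center}(L) \cap \Sigma^\ell} I_y$$
and, for all $y \in \mathrm{Center}(L)$,
$$I_y = \bigcup_{\substack{a \in \Sigma\\ ya \in L}} I_{ya}. \tag{5}$$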



More precisely, if $a_1, \ldots, a_k$ are the letters of $\Sigma$ and if $a_1 < \cdots < a_k$, then for all $y \in \mathrm{Center}(L)$ and all $j \in [1, k]$ such that $ya_j \in \mathrm{Center}(L)$, one has
$$I_{ya_j} = \left[\alpha_y + \sum_{i=1}^{j-1} r_{ya_i},\ \alpha_y + \sum_{i=1}^{j} r_{ya_i}\right]. \tag{6}$$

Remark 24. Let $y, z$ be words in $\Sigma^*$ such that $yz \in L$. If $y$ is a prefix of infinitely many words in $L$ and if $|z|$ is large enough so that every word of $L$ of length $|yz|$ has a prefix in $\mathrm{Center}(L) \cap \Sigma^{|y|}$, then we have
$$\mathrm{val}_S(yz) = v_{q_0}(|yz|-1) + \sum_{\substack{x < y\\ x \in \mathrm{Center}(L) \cap \Sigma^{|y|}}} u_{q_0 \cdot x}(|z|) + \sum_{i=|y|}^{|yz|-1} \sum_{a < (yz)[i]} u_{q_0 \cdot (yz)[0,i-1]a}(|yz|-i-1). \tag{7}$$
Lemma 25. Let $w \in \mathrm{Adh}(L)$. For all $\ell \ge 0$, $w[0,\ell-1]$ belongs to $\mathrm{Center}(L) \cap \Sigma^\ell$ and the limit
$$\lim_{\ell \to +\infty} \alpha_{w[0,\ell-1]}$$
exists.

Proof. The first part is obvious since $w[0,\ell-1]$ is a prefix of $w[0,n-1]$ for any $n \ge \ell$, and these words belong to $L$ because $L$ is prefix-closed, see Lemma 7. For the second part, on the one hand, observe that (5) implies that for all $\ell \ge 1$, $\alpha_{w[0,\ell-1]} \le \alpha_{w[0,\ell]}$. On the other hand, we also have that for all $\ell \ge 1$, $\alpha_{w[0,\ell-1]} \le 1$. Hence, $(\alpha_{w[0,\ell-1]})_{\ell \ge 1}$ is a bounded and non-decreasing sequence, so it converges. □

Notation. For all $w \in \mathrm{Adh}(L)$, $\alpha_w := \lim_{\ell \to +\infty} \alpha_{w[0,\ell-1]}$.

Note that we have $\alpha_w \ge \alpha_{w[0,\ell-1]}$ for all $\ell \ge 1$.

Proposition 26. For all $w \in \mathrm{Adh}(L)$, we have
$$\lim_{n \to +\infty} \frac{\mathrm{val}_S(w[0,n-1])}{v_{q_0}(n)} = \alpha_w.$$

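Proof. Let $w \in \mathrm{Adh}(L)$. For all $\ell$ and $n$ such that $n \ge \ell \ge 1$, we have
$$\alpha_{w[0,\ell-1],n} \le \frac{\mathrm{val}_S(w[0,n-1])}{v_{q_0}(n)} < \alpha_{w[0,\ell-1],n} + \frac{u_{q_0 \cdot w[0,\ell-1]}(n-\ell)}{v_{q_0}(n)}. \tag{8}$$
Let $\varepsilon > 0$. By (8) and Hypothesis (H2), for all $\ell \ge 1$, there exists $N(\ell) \ge \ell$ such that for all $n \ge N(\ell)$, we have
$$\alpha_{w[0,\ell-1]} - \frac{\varepsilon}{2} < \frac{\mathrm{val}_S(w[0,n-1])}{v_{q_0}(n)} < \alpha_{w[0,\ell-1]} + r_{w[0,\ell-1]} + \frac{\varepsilon}{2}.$$
By Hypothesis (H3) and Lemma 25, there also exists $k \in \mathbb{N}$ such that for all $\ell \ge k$,
$$r_{w[0,\ell-1]} < \frac{\varepsilon}{2} \quad\text{and}\quad 0 \le \alpha_w - \alpha_{w[0,\ell-1]} < \frac{\varepsilon}{2}.$$
It follows that for all $n \ge N(k)$,
$$\alpha_w - \varepsilon < \alpha_{w[0,k-1]} - \frac{\varepsilon}{2} < \frac{\mathrm{val}_S(w[0,n-1])}{v_{q_0}(n)} < \alpha_w + \varepsilon,$$
and the conclusion follows. □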

The preceding proposition allows us to define the $S$-value of an infinite word in $\mathrm{Adh}(L)$.

Definition 27. The map $\mathrm{val}_S : \mathrm{Adh}(L) \to [s_0, 1],\ w \mapsto \alpha_w$ is called the $S$-value function.

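Proposition 28. If $w, z \in \mathrm{Adh}(L)$ are such that $w$ is lexicographically less than $z$, then $\mathrm{val}_S(w) \le \mathrm{val}_S(z)$.

Proof. Let $w, z \in \mathrm{Adh}(L)$. We deduce from (6) that if $k := \inf\{i \in \mathbb{N} \mid w[i] < z[i]\}$, then for all $\ell > k$, we have $\alpha_{w[0,\ell-1]} \le \alpha_{z[0,\ell-1]}$, and the proposition holds by letting $\ell$ tend to infinity. □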

Recall now a result from [BB97].

Lemma 29. If $K$ is an infinite language over a totally ordered alphabet, then $\mathrm{Adh}(K)$ contains a minimal element for the lexicographical ordering.

This leads to the following definition. 

Definition 30. For all $y \in \mathrm{Center}(L)$, $m_y$ (resp. $M_y$) denotes the least (resp. greatest) word in $\mathrm{Adh}(L)$ for the lexicographical ordering having $y$ as a prefix.
Notice that for all $y \in \mathrm{Center}(L)$, we have $m_y = yu$ (resp. $M_y = yv$), where $u$ (resp. $v$) is the minimal (resp. maximal) word in $\mathrm{Adh}(y^{-1}L)$ for the lexicographical ordering.

Example 31. Continuing Example 23, we have $m_{aab} = aaba^\omega$ and $M_{aab} = aabb(ab)^\omega$.

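Lemma 32. For all $y \in \mathrm{Center}(L)$, one has
$$\mathrm{val}_S(m_y) = \alpha_y \quad\text{and}\quad \mathrm{val}_S(M_y) = \alpha_y + r_y.$$

Proof. Let $y \in \mathrm{Center}(L)$. From (6), we get that for all $\ell \ge |y|$, $\alpha_{m_y[0,\ell-1]} = \alpha_y$ and $\alpha_{M_y[0,\ell-1]} + r_{M_y[0,\ell-1]} = \alpha_y + r_y$. Therefore, we obtain that for all $\ell \ge |y|$,
$$\alpha_y \le \mathrm{val}_S(m_y) \le \alpha_y + r_{m_y[0,\ell-1]},$$
$$\alpha_y + r_y - r_{M_y[0,\ell-1]} \le \mathrm{val}_S(M_y) \le \alpha_y + r_y.$$
We conclude by using Hypothesis (H3). □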
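Proposition 33. The $S$-value function is uniformly continuous.

Proof. Let $w, z \in \mathrm{Adh}(L)$. Assume that $d(w,z) = 2^{-\ell}$. Then $w[0,\ell-1] = z[0,\ell-1]$ and, in view of Lemma 32, the $S$-values $\mathrm{val}_S(w)$ and $\mathrm{val}_S(z)$ belong to $I_{w[0,\ell-1]}$. Thus $|\mathrm{val}_S(w) - \mathrm{val}_S(z)| \le r_{w[0,\ell-1]}$, which tends to $0$ as $\ell \to +\infty$ by Hypothesis (H3), so $\mathrm{val}_S$ is continuous. Since $\mathrm{Adh}(L)$ is a closed subset of the compact space $\Sigma^\omega$, it is compact, and hence this continuity is uniform. □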

Using Lemma 32, we are able to give an expression of the $S$-value of a word in $\mathrm{Adh}(L)$.

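Proposition 34. For all $w \in \mathrm{Adh}(L)$,
$$\mathrm{val}_S(w) = s_0 + \sum_{i=0}^{+\infty} \sum_{a < w[i]} r_{w[0,i-1]a}.$$

Proof. Let $w \in \mathrm{Adh}(L)$. Using (6), we get that for all $n \ge 1$,
$$\begin{aligned}
\alpha_{w[0,n-1]} &= s_0 + \sum_{\substack{x < w[0,n-1]\\ x \in \mathrm{Center}(L) \cap \Sigma^n}} r_x\\
&= s_0 + \sum_{i=0}^{n-1} \sum_{a < w[i]} \sum_{|y| = n-i-1} r_{w[0,i-1]ay}\\
&= s_0 + \sum_{i=0}^{n-1} \sum_{a < w[i]} r_{w[0,i-1]a}.
\end{aligned}$$
Letting $n$ tend to infinity in the latter equality, we get the expected result. □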

The following proposition links together the framework of [LR02], where mainly converging sequences of words are considered, and the framework that has been developed in the present section to represent real numbers.

Proposition 35. Let $K$ be a language over a totally ordered alphabet $(\Sigma, <)$ such that its prefix-closure $\mathrm{Pref}(K)$ satisfies Hypotheses (H1), (H2) and (H3), and let $S = (\mathrm{Pref}(K), \Sigma, <)$ be the abstract numeration system built on $\mathrm{Pref}(K)$. If $(w^{(n)})_{n \ge 0} \in K^{\mathbb{N}}$ is a sequence of words such that $w^{(n)} \to w$, then we have
$$\lim_{n \to +\infty} \frac{\mathrm{val}_S(w^{(n)})}{v_{q_0}(|w^{(n)}|)} = \alpha_w.$$



12 E. CHARLIER, M. LE GONIDEC, AND M. RIGO 

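Proof. Let $(w^{(n)})_{n \ge 0} \in K^{\mathbb{N}}$ be a sequence of words such that $w^{(n)} \to w$. Thanks to Lemma 5, this implies that $\mathrm{Pref}(w) \subseteq \mathrm{Pref}(K)$. For any $\ell \ge 1$, there exists $N(\ell) \ge \ell$ such that for all $n \ge N(\ell)$, $w^{(n)}[0,\ell-1] = w[0,\ell-1]$. Then, in view of (7) and (8), for all $\ell \ge 1$ and for all $n \ge N(\ell)$, we have
$$\left|\frac{\mathrm{val}_S(w[0,|w^{(n)}|-1])}{v_{q_0}(|w^{(n)}|)} - \frac{\mathrm{val}_S(w^{(n)})}{v_{q_0}(|w^{(n)}|)}\right| < \frac{u_{q_0 \cdot w[0,\ell-1]}(|w^{(n)}|-\ell)}{v_{q_0}(|w^{(n)}|)}.$$
Let $\varepsilon > 0$. By Hypothesis (H2), for all $\ell \ge 1$, there exists $M(\ell) \ge \ell$ such that for all $n \ge M(\ell)$,
$$\frac{u_{q_0 \cdot w[0,\ell-1]}(|w^{(n)}|-\ell)}{v_{q_0}(|w^{(n)}|)} < r_{w[0,\ell-1]} + \frac{\varepsilon}{2}.$$
By Hypothesis (H3), there exists $k \in \mathbb{N}$ such that for all $\ell \ge k$, $r_{w[0,\ell-1]} < \frac{\varepsilon}{2}$. Then for all $n \ge \max(N(k), M(k))$, we have
$$\left|\frac{\mathrm{val}_S(w[0,|w^{(n)}|-1])}{v_{q_0}(|w^{(n)}|)} - \frac{\mathrm{val}_S(w^{(n)})}{v_{q_0}(|w^{(n)}|)}\right| < \varepsilon.$$
Since $|w^{(n)}| \to +\infty$, the conclusion follows from Proposition 26. □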



To conclude this section, we recall some results from [BB97] that are of interest for our study.

Proposition 36. If $K$ is an infinite algebraic language over a totally ordered alphabet, then the minimal word of $\mathrm{Adh}(K)$ is ultimately periodic and can be effectively computed.

Definition 37. Let $K$ be a language over a totally ordered alphabet. The minimal language of $K$, denoted by $\min(K)$, is the language of the smallest words of each length for the lexicographical ordering:
$$\min(K) = \{w \in K \mid \forall z \in K,\ |w| = |z| \Rightarrow w \le_{\mathrm{lex}} z\}.$$

Proposition 38. If $K$ is an infinite language such that $K = \mathrm{Center}(K)$, then we have $\min(K) = \mathrm{Pref}(m_\varepsilon)$.

Corollary 39. If $K$ is an infinite algebraic language such that $K = \mathrm{Center}(K)$, then $\mathrm{Pref}(m_\varepsilon)$ is a regular language.

Of course, all these results can be adapted to the case of the maximal word of 
the adherence of a language. 

Transposed to the context of this paper, these results can be related to syntactical properties of the endpoints of the intervals $I_y$, for $y \in \mathrm{Center}(L)$.

Corollary 40. Assume that the language $L$ is algebraic. Then for all $y \in \mathrm{Center}(L)$, the infinite words $m_y$ and $M_y$ are ultimately periodic.

Notice that, in general, there exist ultimately periodic representations that are not endpoints of any interval $I_y$ with $y \in \mathrm{Center}(L)$. For instance, in the integer base 10 numeration system, the representation of $\frac{1}{3}$ is $0.33333\cdots$, and $\frac{1}{3}$ is not the endpoint of any interval of the form $\left[\frac{k}{10^\ell}, \frac{k+1}{10^\ell}\right]$, where $\ell \ge 1$ and $k \in [0, 10^\ell - 1]$.
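The base 10 example is elementary arithmetic: $\frac{1}{3}$ is an endpoint of some interval $\left[\frac{k}{10^\ell}, \frac{k+1}{10^\ell}\right]$ exactly when $10^\ell/3$ is an integer, which never happens. The short Python check below is our own illustration and is limited to $\ell \le 50$. It computes the digits of $\frac{1}{3}$ by long division and tests the endpoint condition, with $\frac{1}{4}$ as a contrasting case.

```python
from fractions import Fraction


def decimal_digits(x, count):
    """First `count` digits of x in (0, 1) in base 10, by long division."""
    digits = []
    for _ in range(count):
        x *= 10
        d = x.numerator // x.denominator
        digits.append(d)
        x -= d
    return digits


def is_decimal_endpoint(x, max_len):
    """True if x = j / 10^l for some 1 <= l <= max_len, i.e. x is an endpoint of some [k/10^l, (k+1)/10^l]."""
    return any((x * 10**l).denominator == 1 for l in range(1, max_len + 1))


print(decimal_digits(Fraction(1, 3), 8), is_decimal_endpoint(Fraction(1, 3), 50))
print(decimal_digits(Fraction(1, 4), 4), is_decimal_endpoint(Fraction(1, 4), 50))
```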

6. Applications 

In this section, we apply our techniques to three examples to represent real numbers in situations that were not settled in [LR02]. The first one shows how it can be easier to consider the prefix-closure of the language instead of the language itself.
Example 41. Consider again the language $L = \{w \in \{a,b\}^* \mid \big||w|_a - |w|_b\big| \le 1\}$ of Example 2. This language is not prefix-closed. We have $\mathrm{Pref}(L) = \{a,b\}^*$, which is of course a regular language. For the abstract numeration system $S = (\mathrm{Pref}(L), \{a,b\}, a < b)$, the hypotheses (H1), (H2) and (H3) are trivially satisfied. More precisely, for all $w \in \{a,b\}^*$, we have $r_w = 2^{-|w|-1}$. Using the same notation as in Example 2, we have
$$\lim_{n \to +\infty} \frac{v_0(n-1)}{v_0(n)} = \frac{1}{2}.$$
Therefore, we represent the interval $[\frac{1}{2}, 1]$. For all $\ell \ge 1$, $\mathrm{Center}(L) \cap \Sigma^\ell = \{a,b\}^\ell$, and the intervals corresponding to words of length $\ell$ are exactly the intervals $\left[\frac{2^\ell + k}{2^{\ell+1}}, \frac{2^\ell + k + 1}{2^{\ell+1}}\right]$, for any $k \in [0, 2^\ell - 1]$.
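For this system everything is explicit, which makes it a convenient test case. Reading $a$ as the digit $0$ and $b$ as $1$, Proposition 3 gives $\mathrm{val}_S(w) = 2^{|w|} - 1 + \langle w \rangle_2$, where $\langle w \rangle_2$ denotes the integer written $w$ in base 2, and $v_0(n) = 2^{n+1} - 1$. The sketch below (our own code and names) compares the ratio of Proposition 26 on a long prefix with the series of Proposition 34. In that series, each letter $b$ at position $i$ contributes $r_{w[0,i-1]a} = 2^{-i-2}$. For $w = (ab)^\omega$, both quantities should be close to $\frac{1}{2} + \frac{1}{6} = \frac{2}{3}$.

```python
from fractions import Fraction


def val_S(w):
    """Proposition 3 for Pref(L) = {a,b}*: v(|w|-1) = 2^|w| - 1 and u_q(m) = 2^m for every state."""
    n = len(w)
    return (2**n - 1) + sum(2 ** (n - i - 1) for i, c in enumerate(w) if c == "b")


def v(n):
    return 2 ** (n + 1) - 1  # words of length at most n over {a, b}


def alpha(w):
    """Left endpoint alpha_w of I_w, i.e. the sum of Proposition 34 truncated to w,
    with s_0 = 1/2 and r_x = 2^(-|x|-1)."""
    return Fraction(1, 2) + sum(Fraction(1, 2 ** (i + 2)) for i, c in enumerate(w) if c == "b")


w = "ab" * 20  # a prefix of (ab)^omega
print(float(Fraction(val_S(w), v(len(w)))), float(alpha(w)))
print([alpha(y) for y in ("aa", "ab", "ba", "bb")])  # left endpoints (4 + k)/8
```

The last line recovers the intervals described above for $\ell = 2$: their left endpoints are $\frac{4}{8}, \frac{5}{8}, \frac{6}{8}, \frac{7}{8}$.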

The second example illustrates the case of a non-regular language with a non-regular prefix-language.

Example 42. The Dyck language is the language
$$D := \{w \in \{a,b\}^* \mid |w|_a = |w|_b \text{ and } \forall u \in \mathrm{Pref}(w),\ |u|_a \ge |u|_b\}$$
of the well-parenthesized words over two letters. Its (infinite) minimal automaton $\mathcal{A}_D = (Q, q_0, \{a,b\}, \delta, \{q_0\})$ is represented in Figure 4. For each $m \ge 0$, define $d_m = (a^m)^{-1}D = \{w \in \{a,b\}^* \mid a^m w \in D\}$ and $d_{-1} = \emptyset$, so that $Q = \{d_m \mid m \ge 0\} \cup \{d_{-1}\}$. Notice that in Figure 4, the states $d_m$ are simply denoted by $m$.




Figure 4. The minimal automaton of D.
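It has been proved in [LG08] that for all $m \ge 0$,

$$u_{d_m}(n) = \begin{cases} 0 & \text{if } n < m \text{ or } n \not\equiv m \pmod 2,\\[2pt] \dfrac{m+1}{n+1}\dbinom{n+1}{\frac{n-m}{2}} & \text{if } n \ge m \text{ and } n \equiv m \pmod 2.\end{cases}$$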
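The formula quoted from [LG08] can be cross-checked by brute force. The Python sketch below, a hypothetical helper of ours and not code from [LG08], counts the words of length $n$ leading from $d_m$ to the final state $d_0$ by tracking the height (the number of unmatched $a$'s, so $a = +1$ and $b = -1$), and compares the count with the closed form.

```python
from math import comb


def dyck_completions(m, n):
    """u_{d_m}(n) by dynamic programming: words of length n leading from
    height m (state d_m) to height 0 without going below 0 (a = +1, b = -1)."""
    counts = {m: 1}                                  # height -> number of words
    for _ in range(n):
        nxt = {}
        for h, c in counts.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c       # read a
            if h > 0:                                # b is only allowed above 0
                nxt[h - 1] = nxt.get(h - 1, 0) + c
        counts = nxt
    return counts.get(0, 0)


def ballot_formula(m, n):
    """Closed form for u_{d_m}(n) quoted from [LG08]."""
    if n < m or (n - m) % 2:
        return 0
    return (m + 1) * comb(n + 1, (n - m) // 2) // (n + 1)


for m in range(6):
    for n in range(18):
        assert dyck_completions(m, n) == ballot_formula(m, n)
print([dyck_completions(0, 2 * i) for i in range(6)])   # [1, 1, 2, 5, 14, 42]
```

For $m = 0$ the counts are the Catalan numbers, as expected for $u_{d_0}(2n)$.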

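By Stirling's formula, we get that for all $m \ge 0$,

$$u_{d_{2m}}(2n) \sim \frac{2m+1}{\sqrt{\pi}}\, n^{-3/2}\, 4^{n} \quad (n \to +\infty), \tag{9}$$

$$u_{d_{2m+1}}(2n+1) \sim \frac{2(2m+2)}{\sqrt{\pi}}\, n^{-3/2}\, 4^{n} \quad (n \to +\infty). \tag{10}$$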
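A numerical sanity check of (9) and (10), assuming only the closed form above: the Python sketch divides $u_{d_{2m}}(2n)$ and $u_{d_{2m+1}}(2n+1)$ by the claimed equivalents. It divides the huge integer by $4^n$ first (Python's true division of two integers is correctly rounded), which avoids floating-point overflow. The ratios should approach 1 with an error of order $1/n$.

```python
from math import comb, pi, sqrt


def u_d(m, n):
    """u_{d_m}(n), closed form quoted from [LG08]."""
    if n < m or (n - m) % 2:
        return 0
    return (m + 1) * comb(n + 1, (n - m) // 2) // (n + 1)


def ratio_9(m, n):
    """u_{d_{2m}}(2n) divided by the right-hand side of (9)."""
    scaled = u_d(2 * m, 2 * n) / 4 ** n   # big-int true division, rounded to a float
    return scaled / ((2 * m + 1) / sqrt(pi) * n ** -1.5)


def ratio_10(m, n):
    """u_{d_{2m+1}}(2n+1) divided by the right-hand side of (10)."""
    scaled = u_d(2 * m + 1, 2 * n + 1) / 4 ** n
    return scaled / (2 * (2 * m + 2) / sqrt(pi) * n ** -1.5)


for m in range(3):
    print(m, round(ratio_9(m, 2000), 4), round(ratio_10(m, 2000), 4))
```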

The Dyck language is not prefix-closed. Hence we consider the abstract numeration system $S = (P, \{a,b\}, a < b)$ built on the language

$$P := \mathrm{Pref}(D) = \{w \in \{a,b\}^* \mid \forall u \in \mathrm{Pref}(w),\ |u|_a \ge |u|_b\}$$

of the prefixes of the Dyck words. The (infinite) minimal automaton of $P$ is $\mathcal{A}_P = (Q, q_0, \{a,b\}, \delta, F)$. It is represented in Figure 5. Since the minimal automaton $\mathcal{A}_P$ of $P$ and the minimal automaton $\mathcal{A}_D$ of $D$ are nearly the same, we rename the states of $\mathcal{A}_P$ by $p_m := d_m$. Hence the $u_{d_m}$'s denote the complexity functions of $\mathcal{A}_D$ and the $u_{p_m}$'s denote the complexity functions of $\mathcal{A}_P$. By Proposition 10, $\mathrm{Adh}(P) = \mathrm{Adh}(D)$ is uncountable and Hypothesis (H1) is satisfied.
Observe that for all $m \ge 0$,

$$u_{p_m}(n) = \begin{cases} 2^n & \text{if } n \le m,\\ 2\,u_{p_m}(n-1) - u_{d_m}(n-1) & \text{if } n > m.\end{cases}$$



Figure 5. The minimal automaton of Pref(D).



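Hence we get that for all $m \ge 0$,

$$u_{p_m}(n) = \begin{cases} 2^n & \text{if } n \le m,\\ 2^n - \displaystyle\sum_{i=m}^{n-1} u_{d_m}(i)\, 2^{n-i-1} & \text{if } n > m.\end{cases}$$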
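The recurrence and its unrolled form can be tested on small cases. The Python sketch below, assuming the usual encoding of $a$ as $+1$ and $b$ as $-1$ so that state $p_m$ corresponds to height $m$, computes $u_{p_m}(n)$ (all words readable from $p_m$) and $u_{d_m}(n)$ (those among them that lead to $d_0$) in one pass and checks both displays.

```python
def path_counts(m, n_max):
    """Lists u_p, u_d with u_p[n] = u_{p_m}(n) and u_d[n] = u_{d_m}(n) for
    0 <= n <= n_max, obtained by walking heights from m (a = +1, b = -1)."""
    u_p, u_d = [], []
    counts = {m: 1}                         # height -> number of words
    for _ in range(n_max + 1):
        u_p.append(sum(counts.values()))    # all words of this length readable from p_m
        u_d.append(counts.get(0, 0))        # those leading from d_m to d_0
        nxt = {}
        for h, c in counts.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + c
        counts = nxt
    return u_p, u_d


for m in range(4):
    u_p, u_d = path_counts(m, 20)
    for n in range(1, 21):
        if n <= m:
            assert u_p[n] == 2 ** n
        else:
            assert u_p[n] == 2 * u_p[n - 1] - u_d[n - 1]
            assert u_p[n] == 2 ** n - sum(u_d[i] * 2 ** (n - i - 1)
                                         for i in range(m, n))
print(path_counts(0, 8)[0])   # [1, 1, 2, 3, 6, 10, 20, 35, 70]
```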

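We have that for all $m \ge 0$,

$$u_{p_m}(2n) \sim \frac{m+1}{\sqrt{\pi}}\, n^{-1/2}\, 4^{n} \quad (n \to +\infty), \tag{11}$$

$$u_{p_m}(2n+1) \sim v_{p_m}(2n) \sim \frac{2(m+1)}{\sqrt{\pi}}\, n^{-1/2}\, 4^{n} \quad (n \to +\infty), \tag{12}$$

$$v_{p_m}(2n+1) \sim \frac{4(m+1)}{\sqrt{\pi}}\, n^{-1/2}\, 4^{n} \quad (n \to +\infty). \tag{13}$$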
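To see (11), (12) and (13) numerically, here is a hedged Python sketch: it computes $u_{p_m}(k)$ exactly by dynamic programming on the height, forms the partial sums $v_{p_m}(k)$ with `itertools.accumulate`, and returns the four ratios to the claimed equivalents for a moderate $n$. Convergence is only of order $1/n$, so the ratios are close to 1 but not equal to it.

```python
from itertools import accumulate
from math import pi, sqrt


def prefix_counts(m, n_max):
    """[u_{p_m}(0), ..., u_{p_m}(n_max)]: words read from height m that never
    go below 0 (a = +1, b = -1)."""
    out = []
    counts = {m: 1}                       # height -> number of words
    for _ in range(n_max + 1):
        out.append(sum(counts.values()))
        nxt = {}
        for h, c in counts.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + c
        counts = nxt
    return out


def asymptotic_ratios(m, n):
    """Ratios of u_{p_m}(2n), u_{p_m}(2n+1), v_{p_m}(2n), v_{p_m}(2n+1) to
    c, 2c, 2c, 4c times n^(-1/2) 4^n, where c = (m+1)/sqrt(pi)."""
    u = prefix_counts(m, 2 * n + 1)
    v = list(accumulate(u))               # v[k] = u[0] + ... + u[k]
    base = (m + 1) / sqrt(pi) * n ** -0.5
    return (u[2 * n] / 4 ** n / base,
            u[2 * n + 1] / 4 ** n / (2 * base),
            v[2 * n] / 4 ** n / (2 * base),
            v[2 * n + 1] / 4 ** n / (4 * base))


for m in range(3):
    print(m, [round(r, 3) for r in asymptotic_ratios(m, 300)])
```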



We prove only (11), since the same techniques can be applied to obtain (12) and (13). Let us first show that for all $m \ge 0$, we have

$$\sum_{i=m}^{+\infty} u_{d_{2m}}(2i)\, 4^{-i} = 2 \quad \text{and} \quad \sum_{i=m}^{+\infty} u_{d_{2m+1}}(2i+1)\, 4^{-i} = 4. \tag{14}$$
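The two values in (14) can be approached numerically. In the Python sketch below we sum each series up to $N-1$ and add the tail estimate suggested by (9) and (10), namely $\frac{2m+1}{\sqrt\pi}\cdot 2N^{-1/2}$ and $\frac{2(2m+2)}{\sqrt\pi}\cdot 2N^{-1/2}$, using $\sum_{i \ge N} i^{-3/2} \sim 2N^{-1/2}$. This is only a plausibility check; the proof is the generating-function argument that follows.

```python
from math import comb, pi, sqrt


def u_d(m, n):
    """u_{d_m}(n), closed form quoted from [LG08]."""
    if n < m or (n - m) % 2:
        return 0
    return (m + 1) * comb(n + 1, (n - m) // 2) // (n + 1)


def sums_14(m, N):
    """Approximations of the two series of (14): partial sums over m <= i < N
    plus the tail estimates derived from (9) and (10)."""
    s_even = sum(u_d(2 * m, 2 * i) / 4 ** i for i in range(m, N))
    s_odd = sum(u_d(2 * m + 1, 2 * i + 1) / 4 ** i for i in range(m, N))
    tail_even = (2 * m + 1) / sqrt(pi) * 2 / sqrt(N)
    tail_odd = 2 * (2 * m + 2) / sqrt(pi) * 2 / sqrt(N)
    return s_even + tail_even, s_odd + tail_odd


for m in range(3):
    print(m, [round(s, 3) for s in sums_14(m, 1500)])   # close to 2 and 4
```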



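We compute only the first sum; the second one can be treated in a similar way. In view of (9) and [Bou07, Ch. V.4, Prop. 2], for all $m \ge 0$, we have

$$u_{d_{2m}}(2n)\, 4^{-n} \sim \frac{2m+1}{\sqrt{\pi}}\, n^{-3/2} \quad (n \to +\infty),$$

and the series $\sum_{i \ge m} u_{d_{2m}}(2i)\, 4^{-i}$ is convergent. Consequently, for all $m \ge 0$, the series

$$\sum_{i=m}^{+\infty} u_{d_{2m}}(2i)\, z^i$$

is uniformly convergent over $\{z \in \mathbb{C} \mid |z| \le \frac14\}$ because for all $q > p \ge m$, we have

$$\sup_{|z| \le \frac14} \left| \sum_{i=p}^{q} u_{d_{2m}}(2i)\, z^i \right| \le \sum_{i=p}^{q} u_{d_{2m}}(2i)\, 4^{-i}.$$

Then observe that for all $m \ge 0$ and $i \ge m$ such that $i \equiv m \pmod 2$, we have

$$u_{d_m}(i) = \mathrm{Card}\Big\{ w^{(0)} b\, w^{(1)} b \cdots b\, w^{(m)} \;\Big|\; \forall j \in [0,m],\ w^{(j)} \in D,\ \sum_{j=0}^{m} |w^{(j)}| = i - m \Big\} = [z^{\frac{i-m}{2}}] \Big( \sum_{j=0}^{+\infty} C_j z^j \Big)^{m+1},$$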
where $C_n := u_{d_0}(2n) = \frac{1}{n+1}\binom{2n}{n}$ is the $n$th Catalan number [GKP94] and $[z^n]f$ is the coefficient of $z^n$ in the power series $f$. It is well known that

$$\sum_{n=0}^{+\infty} C_n z^n = \frac{1 - \sqrt{1-4z}}{2z}$$

for $0 < |z| \le \frac14$. Hence we get that for all $m \ge 0$,

$$\sum_{i=m}^{+\infty} u_{d_{2m}}(2i)\, z^i = z^m \Big( \sum_{n=0}^{+\infty} C_n z^n \Big)^{2m+1} = z^m \left( \frac{1-\sqrt{1-4z}}{2z} \right)^{2m+1} = \frac{\left(1-\sqrt{1-4z}\right)^{2m+1}}{2 \cdot 4^m\, z^{m+1}}.$$
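The coefficient identity behind this computation, $u_{d_{2m}}(2i) = [z^{i-m}]\,C(z)^{2m+1}$ with $C(z) = \sum_n C_n z^n$, and the value 2 of the closed form at $z = \frac14$ can be checked in a few lines. The Python sketch below multiplies truncated power series with integer coefficients and also compares the series with the closed form at $z = 0.2$; `series_power` and the other helper names are ours.

```python
from math import comb, sqrt


def catalan(n):
    return comb(2 * n, n) // (n + 1)


def u_d(m, n):
    """u_{d_m}(n), closed form quoted from [LG08]."""
    if n < m or (n - m) % 2:
        return 0
    return (m + 1) * comb(n + 1, (n - m) // 2) // (n + 1)


def series_power(coeffs, k, n_terms):
    """First n_terms coefficients of (sum_j coeffs[j] z^j)^k (k >= 0)."""
    result = [1] + [0] * (n_terms - 1)
    for _ in range(k):
        result = [sum(result[i] * coeffs[n - i] for i in range(n + 1))
                  for n in range(n_terms)]
    return result


def closed_form(m, z):
    """(1 - sqrt(1 - 4z))^(2m+1) / (2 * 4^m * z^(m+1)), for 0 < z <= 1/4."""
    return (1 - sqrt(1 - 4 * z)) ** (2 * m + 1) / (2 * 4 ** m * z ** (m + 1))


n_terms = 12
C = [catalan(j) for j in range(n_terms)]
for m in range(4):
    power = series_power(C, 2 * m + 1, n_terms)
    # [z^i] of z^m C(z)^(2m+1) is power[i - m]
    assert all(u_d(2 * m, 2 * i) == power[i - m] for i in range(m, m + n_terms))
    series_at_02 = sum(u_d(2 * m, 2 * i) * 0.2 ** i for i in range(m, 400))
    assert abs(series_at_02 - closed_form(m, 0.2)) < 1e-9
    print(m, closed_form(m, 0.25))   # 2.0
```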



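Therefore, we obtain the desired first sum of (14) by letting $z$ tend to $\frac14$ in the corresponding formula.

We now come back to (11). For all $0 \le m < n$, we have

$$u_{p_{2m}}(2n) = 4^n - \frac{4^n}{2} \sum_{i=m}^{n-1} u_{d_{2m}}(2i)\, 4^{-i} = \frac{4^n}{2} \sum_{i=n}^{+\infty} u_{d_{2m}}(2i)\, 4^{-i}$$

and

$$u_{p_{2m+1}}(2n) = 4^n - \frac{4^n}{4} \sum_{i=m}^{n-1} u_{d_{2m+1}}(2i+1)\, 4^{-i} = \frac{4^n}{4} \sum_{i=n}^{+\infty} u_{d_{2m+1}}(2i+1)\, 4^{-i}.$$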
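The first equality in each of the last two displays is a finite identity, so it can be verified exactly. The Python sketch below computes $u_{p_{2m}}(2n)$ and $u_{p_{2m+1}}(2n)$ independently by dynamic programming on the height ($a = +1$, $b = -1$) and compares them with the middle expressions in exact rational arithmetic; the second equalities then follow from (14).

```python
from fractions import Fraction
from math import comb


def u_d(m, n):
    """u_{d_m}(n), closed form quoted from [LG08]."""
    if n < m or (n - m) % 2:
        return 0
    return (m + 1) * comb(n + 1, (n - m) // 2) // (n + 1)


def prefix_count(m, n):
    """u_{p_m}(n) by dynamic programming on the height."""
    counts = {m: 1}
    for _ in range(n):
        nxt = {}
        for h, c in counts.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + c
        counts = nxt
    return sum(counts.values())


def check_identities(m, n):
    """Compare both sides of the two finite identities, for 0 <= m < n."""
    even_rhs = 4 ** n - Fraction(4 ** n, 2) * sum(
        Fraction(u_d(2 * m, 2 * i), 4 ** i) for i in range(m, n))
    odd_rhs = 4 ** n - Fraction(4 ** n, 4) * sum(
        Fraction(u_d(2 * m + 1, 2 * i + 1), 4 ** i) for i in range(m, n))
    return (prefix_count(2 * m, 2 * n) == even_rhs,
            prefix_count(2 * m + 1, 2 * n) == odd_rhs)


print(all(all(check_identities(m, n))
          for m in range(4) for n in range(m + 1, m + 10)))   # True
```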



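Notice that $\sum_{i=n}^{+\infty} i^{-3/2} \sim 2n^{-1/2}$. Finally, using (9) and (10) in these tails, we obtain that for all $m \ge 0$,

$$u_{p_{2m}}(2n) \sim \frac{2m+1}{\sqrt{\pi}}\, n^{-1/2}\, 4^{n}$$

and

$$u_{p_{2m+1}}(2n) \sim \frac{2m+2}{\sqrt{\pi}}\, n^{-1/2}\, 4^{n},$$

proving (11).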

Let us now verify that the language $P$ satisfies our three hypotheses. From the previous reasoning, we get that for all $m \ge 0$ and all $\ell \ge 0$,

$$\lim_{n \to +\infty} \frac{u_{p_m}(n-\ell)}{v_{p_0}(n)} = (m+1)\, 2^{-\ell-1}.$$

For all $w \in P$, $r_w := (m_w + 1)\, 2^{-|w|-1}$, where $m_w$ is defined by $p_0 \cdot w = p_{m_w}$, and for all $w \notin P$, $r_w := 0$. Hence Hypothesis (H2) is satisfied. Now let $w \in \mathrm{Adh}(P)$. Observe that $m_{w[0,\ell-1]} \le \ell$ for all $\ell \ge 1$. Therefore, for all $w \in \mathrm{Adh}(P)$, we have $r_{w[0,\ell-1]} \le (\ell+1)\, 2^{-\ell-1} \to 0$ as $\ell \to \infty$, and Hypothesis (H3) is satisfied.
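The limit defining $r_w$ and the bound used for (H3) can be illustrated as follows. With the height encoding $a = +1$, $b = -1$ (so that $p_0 \cdot w = p_{m_w}$ where $m_w$ is the height of $w$), the Python sketch compares $u_{p_m}(n-\ell)/v_{p_0}(n)$ for a fixed large $n$ with $(m+1)2^{-\ell-1}$; the agreement is only approximate because $n$ is finite.

```python
from fractions import Fraction


def prefix_counts(m, n_max):
    """[u_{p_m}(0), ..., u_{p_m}(n_max)] by dynamic programming on the height."""
    out = []
    counts = {m: 1}
    for _ in range(n_max + 1):
        out.append(sum(counts.values()))
        nxt = {}
        for h, c in counts.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + c
            if h > 0:
                nxt[h - 1] = nxt.get(h - 1, 0) + c
        counts = nxt
    return out


def height(w):
    """m_w for w in P (a = +1, b = -1), or None when w is not in P."""
    h = 0
    for c in w:
        h += 1 if c == "a" else -1
        if h < 0:
            return None
    return h


def r(w):
    """r_w = (m_w + 1) 2^(-|w|-1) for w in P, and r_w = 0 otherwise."""
    m = height(w)
    return Fraction(0) if m is None else Fraction(m + 1, 2 ** (len(w) + 1))


n = 600
v0_n = sum(prefix_counts(0, n))            # v_{p_0}(n)
for w in ["a", "ab", "aab", "aaaa"]:
    m, l = height(w), len(w)
    approx = prefix_counts(m, n - l)[n - l] / v0_n
    print(w, r(w), round(approx, 4))

# (H3): along w = a^l the bound (l + 1) 2^(-l-1) is attained.
print([r("a" * l) for l in range(1, 5)])
```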
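Since

$$\lim_{n \to +\infty} \frac{v_{p_0}(n-1)}{v_{p_0}(n)} = \frac12,$$

we represent the interval $[\frac12, 1]$. We have $\mathrm{Center}(D) \cap \Sigma^\ell = P \cap \{a,b\}^\ell$. Any nonempty word of $P$ begins with $a$, so that $I_a = [\frac12, 1]$. We have $\mathrm{Center}(P) \cap \Sigma^2 = \{aa, ab\}$ and $I_a$ is partitioned into two subintervals:

$$I_{aa} = \left[\tfrac12, \tfrac78\right] \quad \text{and} \quad I_{ab} = \left[\tfrac78, 1\right].$$

Then $\mathrm{Center}(P) \cap \Sigma^3 = \{aaa, aab, aba\}$. Thus $I_{ab} = I_{aba}$ and $I_{aa}$ is partitioned into two new subintervals

$$I_{aaa} = \left[\tfrac12, \tfrac34\right] \quad \text{and} \quad I_{aab} = \left[\tfrac34, \tfrac78\right].$$

Then $\mathrm{Center}(D) \cap \Sigma^4 = \{aaaa, aaab, aaba, aabb, abaa, abab\}$ and we get

$$I_{aaaa} = \left[\tfrac12, \tfrac{21}{32}\right], \quad I_{aaab} = \left[\tfrac{21}{32}, \tfrac34\right], \quad I_{aaba} = \left[\tfrac34, \tfrac{27}{32}\right],$$

$$I_{aabb} = \left[\tfrac{27}{32}, \tfrac78\right], \quad I_{abaa} = \left[\tfrac78, \tfrac{31}{32}\right], \quad I_{abab} = \left[\tfrac{31}{32}, 1\right].$$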
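The intervals listed above can be regenerated exactly. The Python sketch below assumes, as in the construction, that for each length $\ell$ the intervals $I_w$, $w \in P \cap \Sigma^\ell$, are consecutive in the lexicographic order $a < b$, start at $\frac12$, and have length $r_w = (m_w+1)2^{-\ell-1}$; it also checks that they exactly cover $[\frac12, 1]$.

```python
from fractions import Fraction
from itertools import product


def height(w):
    """m_w, the height of w (a = +1, b = -1), or None when w is not in P."""
    h = 0
    for c in w:
        h += 1 if c == "a" else -1
        if h < 0:
            return None
    return h


def intervals(length):
    """I_w for all w in P of the given length, assuming they are consecutive
    in lexicographic order (a < b), start at 1/2 and have length r_w."""
    left = Fraction(1, 2)
    out = {}
    for letters in product("ab", repeat=length):   # lexicographic order
        w = "".join(letters)
        m = height(w)
        if m is None:
            continue                                # not a prefix of a Dyck word
        r_w = Fraction(m + 1, 2 ** (length + 1))
        out[w] = (left, left + r_w)
        left += r_w
    assert left == 1                                # the I_w tile [1/2, 1]
    return out


for w, (lo, hi) in intervals(4).items():
    print(w, lo, hi)   # aaaa 1/2 21/32, aaab 21/32 3/4, ..., abab 31/32 1
```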
As stated by Corollary 40, since the language $D$ is algebraic, for all $y \in \mathrm{Center}(D)$, the representations of the endpoints of the interval $I_y$ are ultimately periodic. Let $Q_x$ denote the set of all the representations of $x$. We have $Q_{1/2} = \{a^\omega\}$ and $Q_1 = \{(ab)^\omega\}$. Now let $x \in (\frac12, 1)$ be an endpoint of some interval, i.e., $x = \sup I_w = \inf I_z$ for some $w, z \in \mathrm{Center}(D) \cap \Sigma^\ell$ with $\ell > 0$. We have $Q_x = \{\tilde{w}(ab)^\omega, z a^\omega\}$, where $\tilde{w}$ is the smallest (i.e., shortest) Dyck word having $w$ as a prefix.
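The description of $Q_x$ can be illustrated on $x = \frac78 = \sup I_{aa} = \inf I_{ab}$. Reusing the interval construction of this example (repeated so that the block is self-contained, with the same tiling assumption), the Python sketch checks that the words $\tilde w (ab)^k$ with $w = aa$ and $\tilde w = aabb$ all have intervals with right endpoint $\frac78$, and that the words $ab\,a^k$ all have intervals with left endpoint $\frac78$. The helper `completion`, computing $\tilde w = w\,b^{m_w}$, is ours.

```python
from fractions import Fraction
from itertools import product


def height(w):
    """m_w, the height of w (a = +1, b = -1), or None when w is not in P."""
    h = 0
    for c in w:
        h += 1 if c == "a" else -1
        if h < 0:
            return None
    return h


def intervals(length):
    """I_w for w in P of the given length: consecutive in lexicographic order,
    starting at 1/2, of length r_w = (m_w + 1) 2^(-length-1)."""
    left, out = Fraction(1, 2), {}
    for letters in product("ab", repeat=length):
        w = "".join(letters)
        m = height(w)
        if m is not None:
            r_w = Fraction(m + 1, 2 ** (length + 1))
            out[w] = (left, left + r_w)
            left += r_w
    return out


def completion(w):
    """Shortest Dyck word having w (in P) as a prefix: close every open a."""
    return w + "b" * height(w)


w, z = "aa", "ab"
x = intervals(2)[w][1]
assert x == intervals(2)[z][0] == Fraction(7, 8)
for k in range(4):
    right_word = completion(w) + "ab" * k     # prefix of ~w (ab)^omega
    left_word = z + "a" * k                   # prefix of z a^omega
    assert intervals(len(right_word))[right_word][1] == x
    assert intervals(len(left_word))[left_word][0] == x
print(completion("aab"), completion("abaa"))  # aabb abaabb
```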

The third example illustrates the case of a generalized abstract numeration system generating endpoints of the intervals $I_y$ having no ultimately periodic $S$-representations. It also shows that our methods for representing reals generalize the ones involved to represent reals in the $\frac32$-number system and, by extension, in the rational base number systems as well.



Example 43. Consider the language $L := L_{\frac32}$ recognized by the deterministic automaton $\mathcal{A} = (\mathbb{N} \cup \{-1\}, 0, \{0,1,2\}, \delta, \mathbb{N})$ where the transition function $\delta$ is defined as follows: $\delta(n,a) = \frac12(3n+a)$ if $n \in \mathbb{N}$ and $a \in \{0,1,2\}$ are such that $\frac12(3n+a) \in \mathbb{N}$, and $\delta(n,a) = -1$ otherwise. This language has been introduced and studied in [AFS08]. In particular, it has been shown that the automaton $\mathcal{A}$ is the minimal automaton of $L$, that $L$ is a non-algebraic prefix-closed language and that $\mathrm{Adh}(L)$ is uncountable. Moreover, no element of $\mathrm{Adh}(L)$ is ultimately periodic. The corresponding trim minimal automaton is depicted in Figure 6, where all states are final.




Figure 6. First levels of the trim minimal automaton of L.



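Let $(G_n)_{n \ge 0}$ be the sequence of integers defined by

$$G_0 = 1 \quad \text{and} \quad \forall n \in \mathbb{N},\ G_{n+1} = \left\lceil \tfrac32\, G_n \right\rceil.$$

From [AFS08], we find

$$u_0(0) = 1 \quad \text{and} \quad \forall n \in \mathbb{N},\ u_0(n+1) = G_{n+1} - G_n.$$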
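The counting statement can be checked by walking the tree of $L$ level by level. This Python sketch implements $\delta$ directly, with one stated assumption: read literally, $\delta(0,0) = 0$ would allow leading zeros and give two words of length 1, whereas $u_0(1) = G_1 - G_0 = 1$, so the sketch forbids a leading 0. It then checks that the states reached by the words of length $n$ are exactly the integers $G_{n-1}, \dots, G_n - 1$, which gives $u_0(n) = G_n - G_{n-1}$.

```python
def children(n):
    """Transitions from state n of the automaton of Example 43: digit a leads
    to (3n + a) / 2 whenever this is an integer.  ASSUMPTION: we omit the
    loop 0 --0--> 0 (leading zeros), which is what makes the level counts
    agree with u_0(n + 1) = G_{n+1} - G_n."""
    for a in (0, 1, 2):
        if (3 * n + a) % 2 == 0 and not (n == 0 and a == 0):
            yield a, (3 * n + a) // 2


def level_states(length):
    """States reached from 0 by the words of L of the given length."""
    states = [0]
    for _ in range(length):
        states = [t for s in states for _, t in children(s)]
    return states


G = [1]
for _ in range(20):
    G.append(-(-3 * G[-1] // 2))   # G_{n+1} = ceil(3 G_n / 2), integers only

print(G[:10])                      # [1, 2, 3, 5, 8, 12, 18, 27, 41, 62]
for n in range(1, 16):
    # the states at level n are G_{n-1}, ..., G_n - 1, so u_0(n) = G_n - G_{n-1}
    assert level_states(n) == list(range(G[n - 1], G[n]))
```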
It has been shown in [AFS08] that for all $n \ge 0$, $G_n = \lfloor K (\frac32)^n \rfloor$, where $K := K(3) = 1.6222705\cdots$ is the constant discussed in [OW91, HH97, Ste03]. Consider now the abstract numeration system $S = (L, \{0,1,2\}, 0 < 1 < 2)$ built on this language. From [AFS08], we know that for all $w \in L$,

$$\mathrm{val}_S(w) = \frac12 \sum_{i=0}^{|w|-1} w[i] \left(\frac32\right)^{|w|-1-i}.$$
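Two further statements can be tested on small lengths: that $\mathrm{val}_S(w)$, the rank of $w$ in the genealogical (radix) order of $L$, is given by the displayed sum and coincides with the state reached by $w$, and that $G_n = \lfloor K(\frac32)^n \rfloor$. The Python sketch below keeps the reading of $\delta$ without leading zeros and uses the digits $K \approx 1.6222705$ quoted in the text, which are accurate enough only for small $n$, so the check stops at $n = 15$.

```python
from fractions import Fraction
from math import floor

K = 1.6222705   # K(3), digits as quoted in the text


def children(n):
    """delta of Example 43 without the leading-zero loop 0 --0--> 0 (our reading)."""
    for a in (0, 1, 2):
        if (3 * n + a) % 2 == 0 and not (n == 0 and a == 0):
            yield a, (3 * n + a) // 2


def words(length):
    """(word, state) pairs for the words of L of the given length, in
    lexicographic order (children come in increasing digit order)."""
    level = [((), 0)]
    for _ in range(length):
        level = [(w + (a,), t) for w, s in level for a, t in children(s)]
    return level


def val_formula(w):
    """(1/2) * sum_i w[i] (3/2)^(|w|-1-i), computed exactly."""
    return sum(Fraction(d, 2) * Fraction(3, 2) ** (len(w) - 1 - i)
               for i, d in enumerate(w))


rank = 0                                  # position in the genealogical order
for length in range(12):
    for w, state in words(length):
        assert val_formula(w) == state == rank
        rank += 1

G = [1]
for _ in range(15):
    G.append(-(-3 * G[-1] // 2))          # G_{n+1} = ceil(3 G_n / 2)
assert all(G[n] == floor(K * 1.5 ** n) for n in range(16))
print(rank, G[15])                         # 140 710
```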

Consequently, for all $w \in \mathrm{Adh}(L)$, we have

$$\mathrm{val}_S(w) = \frac{1}{3K} \sum_{i=0}^{+\infty} w[i] \left(\frac23\right)^{i}.$$
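For a feel of this formula, the Python sketch below follows the smallest and the largest infinite words of $\mathrm{Adh}(L)$ (obtained greedily, which is valid because every state has a successor) and compares the truncated series $\frac{1}{3K}\sum_{i<\ell} w[i](\frac23)^i$ with the normalized finite values $\mathrm{val}_S(w[0,\ell-1])/v_0(\ell)$, where $v_0(\ell) = \sum_{i \le \ell} u_0(i) = G_\ell$. The two extreme words should give values close to $\frac23$ and $1$; indeed $G_{n-1}/G_n \to \frac23$. The same no-leading-zero reading of $\delta$ and the quoted digits of $K$ are assumed.

```python
K = 1.6222705   # K(3), digits as quoted in the text


def children(n):
    """delta of Example 43 without the leading-zero loop 0 --0--> 0 (our reading)."""
    for a in (0, 1, 2):
        if (3 * n + a) % 2 == 0 and not (n == 0 and a == 0):
            yield a, (3 * n + a) // 2


def greedy_prefix(length, pick):
    """First `length` digits of the min (pick=min) or max (pick=max) word of
    Adh(L), with the state reached, i.e. val_S of that prefix."""
    state, word = 0, []
    for _ in range(length):
        a, state = pick(children(state))   # tuples compare on the digit first
        word.append(a)
    return word, state


def series_value(word):
    """Truncation of (1 / (3K)) * sum_i w[i] (2/3)^i."""
    return sum(d * (2 / 3) ** i for i, d in enumerate(word)) / (3 * K)


G = [1]
for _ in range(60):
    G.append(-(-3 * G[-1] // 2))           # G_n = v_0(n)

ell = 60
for pick in (min, max):
    word, state = greedy_prefix(ell, pick)
    print(pick.__name__, state / G[ell], series_value(word))
```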

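Now let us verify that $L$ satisfies Hypotheses (H2) and (H3). Recall that, for all $x \in L$, $M_x$ (resp. $m_x$) denotes the maximal (resp. minimal) word in $\mathrm{Adh}(L)$ for the lexicographic ordering having $x$ as a prefix. We have that, for all $x \in L$,

$$r_x = |I_x| = \mathrm{val}_S(M_x) - \mathrm{val}_S(m_x) = \frac{1}{3K} \sum_{i=|x|}^{+\infty} \big(M_x[i] - m_x[i]\big) \left(\frac23\right)^{i} = \frac{1}{3K} \left(\frac23\right)^{|x|} \sum_{i=0}^{+\infty} \big(M_x[|x|+i] - m_x[|x|+i]\big) \left(\frac23\right)^{i} > 0,$$

and Hypothesis (H2) is satisfied. For all $x \in L$, since $M_x[i] - m_x[i] \le 2$ for all $i \ge 0$, we obtain from the last equality that

$$r_x \le \frac{2}{K} \left(\frac23\right)^{|x|} \to 0 \quad \text{as } |x| \to +\infty.$$

Therefore, if $w \in \mathrm{Adh}(L)$, then $\lim_{\ell \to +\infty} r_{w[0,\ell-1]} = 0$ and Hypothesis (H3) is also satisfied.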
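Finally, the bound $r_x \le \frac{2}{K}(\frac23)^{|x|}$ can be observed on all words of small length. The Python sketch approximates $r_x = \mathrm{val}_S(M_x) - \mathrm{val}_S(m_x)$ by truncating both greedy extensions of $x$ after 60 extra digits (the neglected part is at most $\frac{2}{K}(\frac23)^{|x|+60}$), checks $0 < r_x \le \frac{2}{K}(\frac23)^{|x|}$, and checks that the $r_x$ of a fixed length add up to about $\frac13$, the length of $[\frac23, 1]$. It keeps the no-leading-zero reading of $\delta$ and the quoted digits of $K$.

```python
K = 1.6222705   # K(3), digits as quoted in the text


def children(n):
    """delta of Example 43 without the leading-zero loop 0 --0--> 0 (our reading)."""
    for a in (0, 1, 2):
        if (3 * n + a) % 2 == 0 and not (n == 0 and a == 0):
            yield a, (3 * n + a) // 2


def level(length):
    """(word, state) pairs for the words of L of the given length."""
    out = [((), 0)]
    for _ in range(length):
        out = [(w + (a,), t) for w, s in out for a, t in children(s)]
    return out


def greedy_tail(state, length, pick):
    """Next `length` digits of the min or max infinite extension from `state`."""
    digits = []
    for _ in range(length):
        a, state = pick(children(state))
        digits.append(a)
    return digits


def r_approx(word, state, extra=60):
    """Truncation of r_x = (1/(3K)) sum_{i>=|x|} (M_x[i] - m_x[i]) (2/3)^i."""
    top = greedy_tail(state, extra, max)
    bottom = greedy_tail(state, extra, min)
    n = len(word)
    return sum((t - b) * (2 / 3) ** (n + i)
               for i, (t, b) in enumerate(zip(top, bottom))) / (3 * K)


for length in range(1, 9):
    rs = [r_approx(w, s) for w, s in level(length)]
    assert all(0 < r <= 2 / K * (2 / 3) ** length for r in rs)
    print(length, len(rs), round(sum(rs), 6))   # sums close to 1/3
```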



Open problems



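1. Find a necessary condition on any automaton recognizing a language $L$ so that the corresponding $\omega$-language $\mathrm{Adh}(L)$ is uncountable.

2. Let $D_2$ be the Dyck language for two kinds of parentheses. It is well known that for every algebraic language $L$, there exists a faithful sequential mapping $f$ such that $f(\mathrm{Adh}(D_2)) = \mathrm{Adh}(f(D_2)) = \mathrm{Adh}(L)$; see [BN80, Theorem 6] for details. Let $S$ and $T$ be abstract numeration systems built respectively on $\mathrm{Pref}(D_2)$ and $\mathrm{Pref}(L)$. Give a mapping $g$ such that the following diagram commutes, where $[s, 1]$ and $[t, 1]$ denote the intervals of real numbers represented in $S$ and $T$.

$$\begin{array}{ccc}
\mathrm{Adh}(D_2) & \xrightarrow{\ f\ } & \mathrm{Adh}(L)\\
\Big\downarrow{\scriptstyle \mathrm{val}_S} & & \Big\downarrow{\scriptstyle \mathrm{val}_T}\\
{[s, 1]} & \xrightarrow{\ g\ } & {[t, 1]}
\end{array}$$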